Fine-tuning Language Models for Factuality
The fluency and creativity of large language models (LLMs) have
led to their widespread use, sometimes even as a replacement for traditional
search engines. Yet language models are prone to making convincing but
factually inaccurate claims, often referred to as 'hallucinations.' These
errors can inadvertently spread misinformation or harmfully perpetuate
misconceptions. Further, manual fact-checking of model responses is a
time-consuming process, making human factuality labels expensive to acquire. In
this work, we fine-tune language models to be more factual, without human
labeling and targeting more open-ended generation settings than past work. We
leverage two key recent innovations in NLP to do so. First, several recent
works have proposed methods for judging the factuality of open-ended text by
measuring consistency with an external knowledge base or simply a large model's
confidence scores. Second, the direct preference optimization algorithm enables
straightforward fine-tuning of language models on objectives other than
supervised imitation, using a preference ranking over possible model responses.
We show that learning from automatically generated factuality preference
rankings, generated either through existing retrieval systems or our novel
retrieval-free approach, significantly improves the factuality (percent of
generated claims that are correct) of Llama-2 on held-out topics compared with
RLHF or decoding strategies targeted at factuality. At 7B scale, compared to
Llama-2-chat, we observe 58% and 40% reduction in factual error rate when
generating biographies and answering medical questions, respectively.
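The direct preference optimization (DPO) objective mentioned above can be sketched in a few lines. This is a minimal single-pair illustration, not the paper's training code: the log-probability values and the `dpo_loss` name are hypothetical, and real training sums per-token log-probs from the policy and a frozen reference model over batches of preference pairs.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward of each response: beta * log(pi / pi_ref)
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss pushing the chosen reward above the rejected one
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Illustrative numbers: the factuality-preferred (chosen) response already
# has higher relative likelihood under the policy, so the loss is small.
loss = dpo_loss(-12.0, -15.0, -13.0, -13.0)
```

Swapping the chosen and rejected arguments raises the loss, which is the gradient signal that moves probability mass toward the more factual response.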
Search and Rescue under the Forest Canopy using Multiple UAVs
We present a multi-robot system for GPS-denied search and rescue under the
forest canopy. Forests are particularly challenging environments for
collaborative exploration and mapping, in large part due to the existence of
severe perceptual aliasing which hinders reliable loop closure detection for
mutual localization and map fusion. Our proposed system features unmanned
aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning.
When communication is available, each UAV transmits compressed tree-based
submaps to a central ground station for collaborative simultaneous localization
and mapping (CSLAM). To overcome high measurement noise and perceptual
aliasing, we use the local configuration of a group of trees as a distinctive
feature for robust loop closure detection. Furthermore, we propose a novel
procedure based on cycle consistent multiway matching to recover from incorrect
pairwise data associations. The returned global data association is guaranteed
to be cycle consistent, and is shown to improve both precision and recall
compared to the input pairwise associations. The proposed multi-UAV system is
validated both in simulation and during real-world collaborative exploration
missions at NASA Langley Research Center. (Comment: IJRR revision)
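The cycle-consistency idea behind the multiway-matching step can be illustrated on toy data: composing pairwise tree associations around a loop of robots (A to B, B to C, C back to A) should map every landmark to itself, and any pairwise match that breaks this is suspect. The dicts and function names below are illustrative assumptions, not the paper's algorithm, which recovers a consistent global association rather than merely detecting violations.

```python
def compose(f, g):
    """Compose two partial associations given as dicts (g after f)."""
    return {k: g[v] for k, v in f.items() if v in g}

def is_cycle_consistent(ab, bc, ca):
    """True if following A -> B -> C -> A maps every landmark to itself."""
    loop = compose(compose(ab, bc), ca)
    return all(k == v for k, v in loop.items())

# Consistent loop: landmark 0 in A matches 2 in B, then 1 in C, then 0 in A.
ab = {0: 2, 1: 0}
bc = {2: 1, 0: 3}
ca = {1: 0, 3: 1}
print(is_cycle_consistent(ab, bc, ca))      # True

# A single wrong pairwise match breaks the cycle.
bad_ca = {1: 5, 3: 1}
print(is_cycle_consistent(ab, bc, bad_ca))  # False
```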
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
A trustworthy real-world prediction system should be well-calibrated; that
is, its confidence in an answer is indicative of the likelihood that the answer
is correct, enabling deferral to a more expensive expert in cases of
low-confidence predictions. While recent studies have shown that unsupervised
pre-training produces large language models (LMs) that are remarkably
well-calibrated, the most widely-used LMs in practice are fine-tuned with
reinforcement learning with human feedback (RLHF-LMs) after the initial
unsupervised pre-training stage, and results are mixed as to whether these
models preserve the well-calibratedness of their ancestors. In this paper, we
conduct a broad evaluation of computationally feasible methods for extracting
confidence scores from LLMs fine-tuned with RLHF. We find that with the right
prompting strategy, RLHF-LMs verbalize probabilities that are much better
calibrated than the model's conditional probabilities, enabling fairly
well-calibrated predictions. Through a combination of prompting strategy and
temperature scaling, we find that we can reduce the expected calibration error
of RLHF-LMs by over 50%.
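Expected calibration error (ECE), the metric the 50% reduction refers to, bins predictions by stated confidence and compares each bin's average confidence to its empirical accuracy. The sketch below uses made-up data for illustration; the paper's evaluation extracts the confidences from RLHF-LMs via prompting.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin fraction) * |accuracy - avg confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece

# Perfectly calibrated toy case: 80% confidence, 4 of 5 correct.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # ~0

# Overconfident case: 90% confidence but only half correct.
print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))  # ~0.4
```

Temperature scaling, mentioned in the abstract, reduces this same quantity by dividing logits by a scalar fitted on held-out data before taking the softmax.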
Dickkopf-related protein 1 (Dkk1) regulates the accumulation and function of myeloid derived suppressor cells in cancer
Tumor–stroma interactions contribute to tumorigenesis. Tumor cells can educate the stroma at primary and distant sites to facilitate the recruitment of heterogeneous populations of immature myeloid cells, known as myeloid-derived suppressor cells (MDSCs). MDSCs suppress T cell responses and promote tumor proliferation. One outstanding question is how the local and distant stroma modulate MDSCs during tumor progression. Down-regulation of β-catenin is critical for MDSC accumulation and immune suppressive functions in mice and humans. Here, we demonstrate that stroma-derived Dickkopf-1 (Dkk1) targets β-catenin in MDSCs, thus exerting immune suppressive effects during tumor progression. Mice bearing extraskeletal tumors show significantly elevated levels of Dkk1 in bone microenvironment relative to tumor site. Strikingly, Dkk1 neutralization decreases tumor growth and MDSC numbers by rescuing β-catenin in these cells and restores T cell recruitment at the tumor site. Recombinant Dkk1 suppresses β-catenin target genes in MDSCs from mice and humans and anti-Dkk1 loses its antitumor effects in mice lacking β-catenin in myeloid cells or after depletion of MDSCs, demonstrating that Dkk1 directly targets MDSCs. Furthermore, we find a correlation between CD15(+) myeloid cells and Dkk1 in pancreatic cancer patients. We establish a novel immunomodulatory role for Dkk1 in regulating tumor-induced immune suppression via targeting β-catenin in MDSCs
Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study.
Background: Data derived from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study were analyzed in an effort to employ machine learning methods to predict the composite endpoint described in the original study.
Methods: We identified 573 CORAL subjects with complete baseline data and the presence or absence of a composite endpoint for the study. These data were subjected to several models including a generalized linear (logistic-linear) model, support vector machine, decision tree, feed-forward neural network, and random forest, in an effort to attempt to predict the composite endpoint. The subjects were arbitrarily divided into training and testing subsets according to an 80%:20% distribution with various seeds. Prediction models were optimized within the CARET package of R.
Results: The best performance of the different machine learning techniques was that of the random forest method which yielded a receiver operator curve (ROC) area of 68.1%±4.2% (mean ± SD) on the testing subset with ten different seed values used to separate training and testing subsets. The four most important variables in the random forest method were SBP, serum creatinine, glycosylated hemoglobin, and DBP. Each of these variables was also important in at least some of the other methods. The treatment assignment group was not consistently an important determinant in any of the models.
Conclusion: Prediction of a composite cardiovascular outcome was difficult in the CORAL population, even when employing machine learning methods. Assignment to either the stenting or best medical therapy group did not serve as an important predictor of composite outcome.
Clinical Trial Registration: ClinicalTrials.gov, NCT00081731
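The evaluation protocol in the CORAL analysis (an 80%/20% split repeated over several seeds, scored by ROC area) can be sketched as follows. The synthetic subjects and the trivial single-feature "score" below are stand-ins for the study's data and random forest; only the split-and-score protocol is shown, and all names are assumptions.

```python
import random

def roc_auc(scores, labels):
    """ROC AUC as the probability a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_80_20(data, seed):
    """Shuffle with a given seed and split 80% train / 20% test."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Synthetic subjects: (feature, outcome); the feature weakly predicts outcome.
data_rng = random.Random(0)
subjects = [(data_rng.gauss(y, 1.0), y) for y in [0, 1] * 100]

aucs = []
for seed in range(10):  # ten seeds, mirroring the study design
    _train, test = split_80_20(subjects, seed)
    scores = [x for x, _ in test]
    labels = [y for _, y in test]
    aucs.append(roc_auc(scores, labels))
mean_auc = sum(aucs) / len(aucs)
```

Reporting the mean and spread of AUC across seeds, as the study does, guards against a single lucky or unlucky split driving the headline number.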